Abstract

The classical Erdős-Ko-Rado (EKR) Theorem states that if we
choose a family of subsets, each of size $k$, from a fixed set of size
$n$ ($n > 2k$), then the largest possible pairwise intersecting family has
size $t = \binom{n-1}{k-1}$. We consider the probability that a randomly selected
family of size $t = t_n$ has the EKR property (pairwise nonempty intersection)
as $n$ and $k = k_n$ tend to infinity, the latter at a specific
rate. As $t$ gets large, the EKR property is less likely to occur, while
as $t$ gets smaller, the EKR property is satisfied with high probability.
We derive the threshold value for $t$ using Janson's inequality. Using
the Stein-Chen method we show that the distribution of $X_0$, defined
as the number of disjoint pairs of subsets in our family, can be approximated
by a Poisson distribution. We extend our results to yield
similar conclusions for $X_i$, the number of pairs of subsets that overlap
in exactly $i$ elements. Finally, we show that the joint distribution
$(X_0, X_1, \ldots, X_b)$ can be approximated by a multidimensional Poisson
vector with independent components.

1 Introduction 

The classical combinatorics literature is replete with fundamental results on 
properties of intersecting families of sets and subsets of fixed element sets. 
Results in this genre include Sperner's theorem jH], Kneser's theorem [S] and 
the starting point of this paper, the Erdős-Ko-Rado theorem. In 1960, Erdős
and Rado proved that for each pair of positive integers $n$ and $k$, with $k > 2$,
there corresponds a least positive integer $F(n, k)$ such that if $\mathcal{F}$ is a family
of more than $F(n, k)$ sets, each set with $n$ elements, then some $k$ of the
sets have pairwise the same intersection. Together with Ko in 1961, they
produced the Erdős-Ko-Rado (EKR) Theorem [3], which states that if $\mathcal{F}$ is
a pairwise intersecting family of $k$-element subsets chosen from an $n$-element
set with $n > 2k$ then $\|\mathcal{F}\| \le \binom{n-1}{k-1}$, where, throughout this paper, we denote
the cardinality of a set $A$ by $\|A\|$. Note that a maximal ensemble of this type
may be constructed by selecting all $k$-subsets that contain a fixed element $a$.

In this paper, a family of sets defined to have the "EKR property" is
a group of $k$-sized subsets chosen from an $n$-element set such that there
exists a nonempty intersection between any two subsets. We address the
following threshold-type question: as $n$ and $k$ tend to infinity, how many
$k$-subsets will we be able to choose (at random, using the uniform probability
measure on $k$-sets) such that the EKR property is "almost always" or "almost
never" satisfied, where these terms are used in the graph-theoretic rather than
measure-theoretic sense?

Let $R$ denote our family of $k$-sets, so that $\|R\| = t$ is the size of our family
of $k$-sets. We will let $X_0$ be the random variable representing the number of
disjoint pairs in our selection of $t$ subsets. $X_0 = 0$ corresponds to no disjoint
pairs being in our family, i.e., to this family of sets having the EKR property.
We will, accordingly, often use the notation $P(\mathrm{EKR})$ instead of $P(X_0 = 0)$.

In Section 2, we use Janson's exponential inequalities, as found, e.g., in 
PP, to prove asymptotic threshold results of the form 
$P(\mathrm{EKR}) \to 1$ ($t \ll t_0$) and $P(\mathrm{EKR}) \to 0$ ($t \gg t_0$), where, throughout this
paper, we write $a_n \ll b_n$ or $b_n \gg a_n$ if $a_n/b_n \to 0$ ($n \to \infty$). In Section
3, we will examine the distribution of $X_0$ and use the Stein-Chen method of
Poisson approximation [2] to prove that $d_{TV}(\mathcal{L}(X_0), \mathrm{Po}(E(X_0))) \to 0$ as $n$, $k = k_n \to \infty$
at an appropriate rate, where $\mathcal{L}(Z)$ represents the distribution of
the random variable $Z$, $d_{TV}$ the usual total variation distance, and $\mathrm{Po}(\lambda)$
the Poisson distribution with parameter $\lambda$. The generalization of the EKR
property alluded to in the abstract will be provided in Section 4, where we
present asymptotic results on the existence and numbers of pairs of $k$-sets
that overlap in exactly $r$ elements. Finally, in Section 5, we discuss the joint
distribution of the ensemble $(X_0, X_1, \ldots, X_b)$ where for $i = 0, 1, \ldots, b$, $X_i$ is
the random variable representing the number of pairs of $k$-sets which overlap
in exactly $i$ elements.

We end this section with a few potential applications: Suppose that at 
an international event, there is a need for interpreters to be hired. If n is 
the total number of languages spoken amongst the t interpreters who will 
be present at such an event and if each interpreter speaks k languages, the 
results of this paper could be used to determine thresholds for t so that any 
two interpreters can converse with each other. Similarly, we may consider a 
workshop with t participants, each of whom is randomly scheduled to attend 
k sessions out of a total of n. The EKR property would suggest that any 
pair of participants could have a meaningful dinner conversation, while the 
results of this paper would enable one to derive probabilistic conclusions 
along the same lines, such as the following: What value of t would ensure 
with probability at least 0.95 that at least x pairs of participants can only 
talk about a single session that they both attended, and that between y and 
z pairs of participants find that they attended between two and five common 
sessions? 

2 Threshold for the EKR Property 

Intuitively, one would imagine that for appropriately chosen values of $n$ and $k$,
a small number of randomly chosen $k$-sets would allow the EKR property to
hold with high probability, whereas even a "slightly" larger collection would
cause the pairwise intersection property to be ruined. We make this precise
in the following result, which makes use of Janson's exponential inequality.

Theorem 1 Let $t$ denote the number of $k$-sets chosen at random from an
$n$-element set. We set

$$X_0 = \sum_{j=1}^{\frac{1}{2}\binom{n}{k}\binom{n-k}{k}} I_j,$$

where $I_j$ equals 1 if the $j$th disjoint pair is in the selected ensemble
and $I_j = 0$ otherwise. Then with $t_0 = \sqrt{2\binom{n}{k}/\binom{n-k}{k}}$,

$$P(X_0 = 0) \to \begin{cases} 1 & \text{if } t \ll t_0 \\ 0 & \text{if } t \gg t_0 \\ e^{-A^2} & \text{if } t = (A + o(1))\,t_0 \end{cases}$$

as $n, k \to \infty$, provided $k \gg \sqrt{n}$, $k = o(n)$. If $k = o(n^{2/3})$, we may use the
more convenient $t_0 = \sqrt{2}\,e^{k^2/(2n)}$ in the above result.
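
The following is a minimal numerical sketch, not part of the original argument, comparing the threshold $\sqrt{2\binom{n}{k}/\binom{n-k}{k}}$ of Theorem 1 with the convenient form $\sqrt{2}\,e^{k^2/(2n)}$. The function names and the sample values of $n$ and $k$ are illustrative assumptions; binomial coefficients are handled in log space via `math.lgamma` to avoid overflow.

```python
from math import exp, lgamma, log


def log_binom(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_threshold_exact(n: int, k: int) -> float:
    """log of sqrt(2 * C(n, k) / C(n - k, k))."""
    return 0.5 * (log(2) + log_binom(n, k) - log_binom(n - k, k))


def log_threshold_approx(n: int, k: int) -> float:
    """log of sqrt(2) * exp(k^2 / (2n))."""
    return 0.5 * log(2) + k * k / (2 * n)


# Illustrative (hypothetical) parameter choices with k^3 / n^2 small.
for n, k in [(10_000, 200), (1_000_000, 3_000)]:
    exact = exp(log_threshold_exact(n, k))
    approx = exp(log_threshold_approx(n, k))
    print(f"n={n}, k={k}: exact={exact:.3f}, approx={approx:.3f}")
```

Since $\binom{n}{k}/\binom{n-k}{k} \ge e^{k^2/n}$, the exact threshold is never smaller than the approximation, and the two should be close when $k^3/n^2$ is small.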

Proof We start by altering our model slightly and choosing each $k$-set
independently with probability $p = t/\binom{n}{k}$. In other words, we flip a coin with bias
$p = t/\binom{n}{k}$ to decide whether each of the $\binom{n}{k}$ subsets will be in our family
of $k$-sets. We thus obtain a random collection $R$ of $k$-sized subsets where
$E(\|R\|) = t$. Janson's inequality, which bounds the probability that none
of a sequence of "undesirable" events $B_i$, $i \in I$, occurs, asserts that under
certain fairly general conditions

$$\prod_{i \in I} P(\bar{B}_i) \le P\Bigl(\bigcap_{i \in I} \bar{B}_i\Bigr) \le \exp\Bigl\{-\mu + \frac{\Delta}{1-\varepsilon}\Bigr\},$$

where $P(B_i) \le \varepsilon$ for each $i$, $\mu = \sum_{i \in I} P(B_i)$, and, for $\sim$ to be defined below,

$$\Delta = \sum_{i \sim j} P(B_i \cap B_j).$$

Let $B_i$ be the event that the $i$th disjoint pair of subsets is in our selection
of $k$-sets. It follows that $P\bigl(\bigcap_{i \in I} \bar{B}_i\bigr) = P(\text{selecting a family with the EKR
property}) = P(X_0 = 0)$. Following the canonical set-up for the validity of the
Janson inequality, we say $i \sim j$ if the $i$th and $j$th disjoint pairs of $k$-sets have
one set in common. The probability $P(B_i)$ that a particular pair of disjoint
sets is in our selection of $k$-sets is $p^2$. Since there are $\frac{1}{2}\binom{n}{k}\binom{n-k}{k}$ possible
disjoint pairs, it follows that $\mu = E(X_0) = \frac{1}{2}\binom{n}{k}\binom{n-k}{k}p^2$ and that

$$\prod_{i=1}^{\frac{1}{2}\binom{n}{k}\binom{n-k}{k}} P(\bar{B}_i) = (1 - p^2)^{\frac{1}{2}\binom{n}{k}\binom{n-k}{k}}.$$
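Using the inequality $1 - x \ge \exp\{-x/(1-x)\}$, the lower Janson inequality yields

$$P(X_0 = 0) \ge (1 - p^2)^{\frac{1}{2}\binom{n}{k}\binom{n-k}{k}} \ge \exp\Bigl\{-\frac{\binom{n}{k}\binom{n-k}{k}p^2}{2(1 - p^2)}\Bigr\},$$

so that $P(\mathrm{EKR}) \to 1$ when $p = o\bigl(\sqrt{2/\bigl(\binom{n}{k}\binom{n-k}{k}\bigr)}\bigr)$ or equivalently when
$t = o\bigl(\sqrt{2\binom{n}{k}/\binom{n-k}{k}}\bigr)$. Note: There is a simpler proof of this fact using
Markov's inequality, but we have presented the above proof for uniformity of
exposition. Also, we have assumed above, as we will throughout this paper,
that $p \to 0$.

In terms of an equivalent (and more convenient) exponential bound, we
have, since $\sqrt{2}\,e^{k^2/(2n)} \le \sqrt{2\binom{n}{k}/\binom{n-k}{k}}$, that

$$P(\mathrm{EKR}) \to 1 \quad \text{when } t \ll \sqrt{2}\,e^{k^2/(2n)}.$$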

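Let us now see when the upper bound in Janson's inequality tends to
0, thus yielding $P(\mathrm{EKR}) \to 0$. We have

$$\Delta = \sum_{i \sim j} P(B_i \cap B_j) \le \binom{n}{k}\binom{n-k}{k}^2 p^3,$$

so that the upper Janson inequality yields

$$P(X_0 = 0) \le \exp\Bigl\{-\frac{1}{2}\binom{n}{k}\binom{n-k}{k}p^2 + \frac{\binom{n}{k}\binom{n-k}{k}^2 p^3}{1 - p^2}\Bigr\} = \exp\Bigl\{-\frac{1}{2}\binom{n}{k}\binom{n-k}{k}p^2\Bigl(1 - \frac{2\binom{n-k}{k}p}{1 - p^2}\Bigr)\Bigr\}.$$

We note that this upper bound tends to zero when $t = p\binom{n}{k}$ satisfies

$$\sqrt{\frac{2\binom{n}{k}}{\binom{n-k}{k}}} \ll t \ll \frac{\binom{n}{k}}{2\binom{n-k}{k}}.$$

Assuming, as stated in the theorem, that $k^2 \gg n$ and $k \ll n$, it is easy to
verify that the above range for $t$ is a valid one, i.e., that $\sqrt{2\binom{n}{k}/\binom{n-k}{k}}$ is indeed $\ll
\binom{n}{k}/\bigl(2\binom{n-k}{k}\bigr)$. We conclude by monotonicity that

$$P(X_0 = 0) \to 0 \quad \text{when } t \gg \sqrt{\frac{2\binom{n}{k}}{\binom{n-k}{k}}}.$$

It is easy to verify using the inequality $1 - x \ge \exp\{-x/(1 - x)\}$ that

$$\frac{\binom{n-k}{k}}{\binom{n}{k}} \ge \exp\Bigl\{-\frac{k^2}{n} - \frac{2k^3}{n^2}(1 + o(1))\Bigr\},$$

so that $P(\mathrm{EKR}) \to 0$ when $t \gg \sqrt{2}\,e^{k^2/(2n)}$, provided that $k = o(n^{2/3})$, as asserted.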
Next, we examine the behavior of the EKR property around this threshold
value. If we let $t = (A + o(1))\sqrt{2\binom{n}{k}/\binom{n-k}{k}}$, and let $n$ and $k$ go to $\infty$, the lower
and upper bounds of the Janson inequality together yield $P(X_0 = 0) \sim e^{-A^2}$,
proving the last part of the theorem, but for the altered model, i.e., when we
expect to choose $t = p\binom{n}{k}$ subsets. It remains to "derandomize" our results,
so as to verify that the same threshold is valid when exactly $t$ subsets are
chosen. We proceed in a fashion similar to that in 0j. 

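First we derandomize the result corresponding to $P(X_0 = 0) \to 0$. Let
$\|R\|$ be the exact size of the chosen family. Assuming that $p \gg \sqrt{2/\bigl(\binom{n}{k}\binom{n-k}{k}\bigr)}$,
we wish to prove that if $\|R\| = p\binom{n}{k}$ $k$-sets are chosen, then $P(X_0 = 0) \to 0$.
We have by monotonicity

$$P\Bigl(X_0 = 0 \,\Big|\, \|R\| = p\tbinom{n}{k}\Bigr) \le P\Bigl(X_0 = 0 \,\Big|\, \|R\| \le p\tbinom{n}{k}\Bigr) \le \frac{P(X_0 = 0)}{P\bigl(\|R\| \le p\binom{n}{k}\bigr)} \le 3P(X_0 = 0) \to 0,$$

by our preliminary result, the fact that $P(A|B) \le P(A)/P(B)$ and the fact
that the central limit theorem (or the approximate and asymptotic equality
of the mean and median of the binomial distribution $\mathrm{Bin}(n,p)$) implies that
$P(\mathrm{Bin}(n,p) \le np) \ge 1/3$. It follows that

$$P\Bigl(X_0 = 0 \,\Big|\, \|R\| = p\tbinom{n}{k}\Bigr) \to 0 \quad \text{if } p \gg \sqrt{\frac{2}{\binom{n}{k}\binom{n-k}{k}}},$$

or

$$P\bigl(X_0 = 0 \,\big|\, \|R\| = t\bigr) \to 0 \quad \text{if } t \gg \sqrt{\frac{2\binom{n}{k}}{\binom{n-k}{k}}}.$$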

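To derandomize the case where $P(X_0 = 0) \to 1$, we proceed as before:

$$P\Bigl(X_0 \ge 1 \,\Big|\, \|R\| = p\tbinom{n}{k}\Bigr) \le P\Bigl(X_0 \ge 1 \,\Big|\, \|R\| \ge p\tbinom{n}{k}\Bigr) \le \frac{P(X_0 \ge 1)}{P\bigl(\|R\| \ge p\binom{n}{k}\bigr)} \le 3P(X_0 \ge 1) \to 0$$

if $p \ll \sqrt{2/\bigl(\binom{n}{k}\binom{n-k}{k}\bigr)}$. It follows that

$$P\bigl(X_0 = 0 \,\big|\, \|R\| = t\bigr) \to 1 \quad \text{if } t \ll \sqrt{\frac{2\binom{n}{k}}{\binom{n-k}{k}}},$$

as required.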

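Finally, we know that when $p = (A + o(1))\sqrt{2/\bigl(\binom{n}{k}\binom{n-k}{k}\bigr)}$ (where $A > 0$ is
a constant), $P(X_0 = 0) \to e^{-A^2}$. We define, with hindsight (but somewhat
arbitrarily),

$$p^+ = \frac{t + \bigl\{\binom{n}{k}/\binom{n-k}{k}\bigr\}^{5/16}}{\binom{n}{k}}$$

and

$$p^- = \frac{t - \bigl\{\binom{n}{k}/\binom{n-k}{k}\bigr\}^{5/16}}{\binom{n}{k}},$$

where $t = (A + o(1))\sqrt{2\binom{n}{k}/\binom{n-k}{k}}$. Note that both $p^+$ and $p^-$ are of the form
$(A + o(1))\sqrt{2/\bigl(\binom{n}{k}\binom{n-k}{k}\bigr)}$. Using first $p^+$ as the probability of picking any $k$-set,
thereby obtaining a random collection $R^+$, we see that

$$E(\|R^+\|) = p^+\binom{n}{k} = (A + o(1))\sqrt{\frac{2\binom{n}{k}}{\binom{n-k}{k}}}.$$

Note also that

$$\mathrm{Var}(\|R^+\|) = E(\|R^+\|)(1 - p^+) \le E(\|R^+\|) \sim (A + o(1))\sqrt{\frac{2\binom{n}{k}}{\binom{n-k}{k}}}.$$

It follows that

$$P(\|R^+\| < t) = P\Bigl(\|R^+\| < E(\|R^+\|) - \Bigl\{\tbinom{n}{k}/\tbinom{n-k}{k}\Bigr\}^{5/16}\Bigr) \le P\Bigl(\bigl|\|R^+\| - E(\|R^+\|)\bigr| > \Bigl\{\tbinom{n}{k}/\tbinom{n-k}{k}\Bigr\}^{5/16}\Bigr) \le \frac{(A + o(1))\sqrt{2}}{\bigl\{\binom{n}{k}/\binom{n-k}{k}\bigr\}^{1/8}} \to 0,$$

where the final step follows from Chebychev's inequality. We thus have, writing
$P_{p^+}$ for probabilities in the model with selection probability $p^+$,

$$P(\mathrm{EKR} \mid \|R\| = t) \ge P_{p^+}(X_0 = 0) - P(\|R^+\| < t),$$

implying that

$$\liminf_{n \to \infty} P(\mathrm{EKR} \mid \|R\| = t) \ge e^{-A^2}.$$

The proof using $p^-$ follows a similar path, yielding

$$\limsup_{n \to \infty} P(\mathrm{EKR} \mid \|R\| = t) \le e^{-A^2}$$

and thus that

$$P\Bigl(\mathrm{EKR} \,\Big|\, \|R\| = t,\ t = (A + o(1))\sqrt{2\tbinom{n}{k}/\tbinom{n-k}{k}}\Bigr) \to e^{-A^2}.$$
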
With all three preliminary results derandomized, the proof of Theorem 1 is
complete. □
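
As an informal check of Theorem 1 at finite sizes, the sketch below estimates $P(X_0 = 0)$ by Monte Carlo and prints it next to $e^{-A^2}$ with $A = t/\sqrt{2\binom{n}{k}/\binom{n-k}{k}}$. It is an illustration under stated assumptions, not the authors' procedure: the $t$ sets are drawn independently (so repeats are possible, which is negligible when $t \ll \binom{n}{k}$), the function names are hypothetical, and the values $n = 400$, $k = 40$ are far from the asymptotic regime, so only rough agreement should be expected.

```python
import random
from math import comb, exp, sqrt


def count_disjoint_pairs(family):
    """Number of unordered pairs of sets in `family` with empty intersection."""
    n_sets = len(family)
    return sum(
        1
        for i in range(n_sets)
        for j in range(i + 1, n_sets)
        if family[i].isdisjoint(family[j])
    )


def estimate_p_ekr(n, k, t, trials=500, seed=0):
    """Monte Carlo estimate of P(no disjoint pair) for t random k-subsets of range(n)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Assumption: sets are drawn independently and uniformly (repeats allowed).
        family = [frozenset(rng.sample(range(n), k)) for _ in range(t)]
        if count_disjoint_pairs(family) == 0:
            hits += 1
    return hits / trials


n, k = 400, 40  # illustrative values only
t_star = sqrt(2 * comb(n, k) / comb(n - k, k))
for scale in (0.5, 1.0, 2.0):
    t = max(2, round(scale * t_star))
    a = t / t_star
    print(f"t={t}: simulated={estimate_p_ekr(n, k, t):.3f}, exp(-A^2)={exp(-a * a):.3f}")
```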
To provide a numerical comparison, we note that when $k = n^{3/5}$, the
Erdős-Ko-Rado theorem yields

$$t > \binom{n-1}{k-1} = \exp\bigl\{(2n^{3/5}\log n)/5 \cdot (1 + o(1))\bigr\} \ \Rightarrow\ P(X_0 = 0) = 0,$$

whereas Theorem 1 yields a threshold at $\sqrt{2}\,e^{k^2/(2n)} = \sqrt{2}\,e^{n^{1/5}/2}$.



3 The Distribution of $X_0$

In this section, we use the Stein-Chen method of Poisson Approximation [2] 
to prove the following theorem. 

Theorem 2 Consider a family of $k$-element subsets of an $n$-element set,
obtained by randomly and independently selecting each $k$-set with probability
$p$. Let $X_0$ represent as before the number of disjoint pairs in our selection of
subsets. Then the distribution of $X_0$ can be closely approximated by a Poisson
distribution with parameter $\lambda = \frac{1}{2}\binom{n}{k}\binom{n-k}{k}p^2$ when $p = o\bigl(1/\binom{n-k}{k}\bigr)$.

Proof. One of several approximation theorems in [2] (Corollary 2.C.4) states
that if we consider a sum $Z = \sum_{j \in I} I_j$ of indicator random variables with
$E(Z) = \lambda$, and if for each $j$ there exists a sequence of indicator variables,
$J_{ij}$, such that

$$\mathcal{L}(J_{ij} : i \in I) = \mathcal{L}(I_i : i \in I \mid I_j = 1) \qquad (1)$$

and such that for all $i \ne j$, $J_{ij} \ge I_i$, then

$$d_{TV}(\mathcal{L}(Z), \mathrm{Po}(\lambda)) \le \frac{1 - e^{-\lambda}}{\lambda}\Bigl(\mathrm{Var}(Z) - \lambda + 2\sum_{j} P^2(I_j = 1)\Bigr),$$

where $d_{TV}$ represents the usual total variation distance and $\mathrm{Po}(\lambda)$ the Poisson
random variable with mean $\lambda$. In other words, if a coupling exists such that
the indicator random variables $I_i$ and $J_{ij}$ are positively related, then the total
variation distance between the distribution of the random variable $Z$ and a
Poisson distribution with parameter $\lambda$ may be bounded solely in terms of the
first two moments of $Z$.

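For our problem, we employ the following coupling that clearly satisfies
(1): If $I_j = 1$, we let $J_{ij} = I_i$ $\forall i$. If $I_j = 0$, we add one or both unchosen $k$-sets
to our collection by changing the associated coin flips as needed. Then, we
set $J_{ij} = 1$ if the addition of these $k$-sets creates the selection of the $i$th pair
of disjoint $k$-sets to our collection of sets; it is obvious that $J_{ij} \ge I_i$ $\forall i \ne j$.
Therefore, the above bound may be applied with $\lambda = E(X_0) = \frac{1}{2}\binom{n}{k}\binom{n-k}{k}p^2$
to yield

$$d_{TV}(\mathcal{L}(X_0), \mathrm{Po}(\lambda)) \le \frac{\mathrm{Var}(X_0)}{\lambda} - 1 + 2p^2.$$

Since

$$\mathrm{Var}(X_0) = \mathrm{Var}\Bigl(\sum_j I_j\Bigr) = \sum_j \mathrm{Var}(I_j) + 2\sum_{i<j} \mathrm{Cov}(I_i, I_j) \le \frac{1}{2}\binom{n}{k}\binom{n-k}{k}(p^2 - p^4) + \binom{n}{k}\binom{n-k}{k}^2(p^3 - p^4),$$

it follows that

$$d_{TV}(\mathcal{L}(X_0), \mathrm{Po}(\lambda)) \le 2\binom{n-k}{k}p + \Bigl(1 - 2\binom{n-k}{k}\Bigr)p^2,$$

establishing the result. □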
Note that the threshold value for p in Theorem 1 does in fact fall within the 
domain of applicability of Theorem 2. 
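
The remark above can be illustrated numerically. The sketch below evaluates the bound $2\binom{n-k}{k}p + \bigl(1 - 2\binom{n-k}{k}\bigr)p^2$ from the proof of Theorem 2 at the threshold probability $p = A\sqrt{2/\bigl(\binom{n}{k}\binom{n-k}{k}\bigr)}$. The function names and parameter values are hypothetical, and the computation is done in log space because the binomial coefficients overflow floating point for realistic $n$ and $k$.

```python
from math import exp, lgamma, log


def log_binom(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def tv_bound_at_threshold(n: int, k: int, a: float = 1.0) -> float:
    """Bound 2*C*p + (1 - 2*C)*p^2 with C = C(n-k, k), at p = a*sqrt(2/(C(n,k)*C))."""
    log_p = log(a) + 0.5 * (log(2) - log_binom(n, k) - log_binom(n - k, k))
    p = exp(log_p)  # may underflow to 0.0 for large n, k, which is harmless here
    c_times_p = exp(log_binom(n - k, k) + log_p)  # C(n-k, k) * p in log space
    # 2*C*p + (1 - 2*C)*p^2 rewritten as 2*C*p*(1 - p) + p^2; capped at 1.
    return min(1.0, 2 * c_times_p * (1 - p) + p * p)


# Illustrative values: the bound shrinks as k^2 / n grows.
for n, k in [(10**6, 2_000), (10**6, 3_000), (10**6, 5_000)]:
    print(f"n={n}, k={k}: bound={tv_bound_at_threshold(n, k):.6f}")
```

At the threshold, $\binom{n-k}{k}p$ behaves like $A\sqrt{2}\,e^{-k^2/(2n)}$, so the bound is small exactly when $k^2/n$ is large, in line with the assumption $k \gg \sqrt{n}$.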



4 Pairwise r-Overlapping Sets 

We are not motivated, in this section or the next, by combinatorial results
such as the Erdős-Ko-Rado theorem. Instead our focus turns to the
probabilistic nuances of pairwise intersection properties of randomly selected
$k$-sets. In this section, we use methods similar to those employed in Sections 2
and 3 to prove

(i) threshold results for the existence of, and

(ii) distributional results for the numbers $X_r$ of,
pairs of $k$-sets that overlap in $r$ elements, $r \ge 1$.

Theorem 3 Let $t$ denote the number of $k$-sized sets chosen at random from
an $n$-element set. Let $X_r$ be the number of pairs of sets in the chosen family
which overlap in exactly $r$ elements. Then with $t_r = \sqrt{2\binom{n}{k}/\bigl(\binom{k}{r}\binom{n-k}{k-r}\bigr)}$,

$$P(X_r = 0) \to \begin{cases} 1 & \text{if } t \ll t_r \\ 0 & \text{if } t \gg t_r \\ e^{-A^2} & \text{if } t = (A + o(1))\,t_r \end{cases}$$

as $n, k \to \infty$, provided $k \gg \sqrt{n}$, $k = o(n)$.
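
To see how the threshold of Theorem 3 varies with the overlap size $r$, the following sketch tabulates $\log\sqrt{2\binom{n}{k}/\bigl(\binom{k}{r}\binom{n-k}{k-r}\bigr)}$ over a range of $r$. The names and the values $n = 10{,}000$, $k = 200$ are illustrative assumptions. Because $\binom{k}{r}\binom{n-k}{k-r}/\binom{n}{k}$ is a hypergeometric probability, one would expect the smallest threshold to occur for $r$ near $k^2/n$.

```python
from math import lgamma, log


def log_binom(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_overlap_threshold(n: int, k: int, r: int) -> float:
    """log of sqrt(2 * C(n, k) / (C(k, r) * C(n - k, k - r)))."""
    return 0.5 * (
        log(2) + log_binom(n, k) - log_binom(k, r) - log_binom(n - k, k - r)
    )


n, k = 10_000, 200  # illustrative values with k^2 / n = 4
thresholds = {r: log_overlap_threshold(n, k, r) for r in range(0, 21)}
r_min = min(thresholds, key=thresholds.get)
print(f"overlap with smallest threshold: r={r_min}; k^2/n={k * k / n}")
```

Setting $r = 0$ recovers the threshold of Theorem 1, which gives a quick consistency check between the two statements.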

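Proof. Again, we use the Janson inequality. This time, we let $B_i$ be the
event that both members of the $i$th pair of subsets which overlap in exactly $r$
elements are among the selected subsets. Let $X_r$ be the number of events $B_i$
which occur, so that $P\bigl(\bigcap_{i \in I} \bar{B}_i\bigr) = P(X_r = 0)$. Choose a random collection
$R$ of $k$-sized subsets with $E(\|R\|) = t$ by independently selecting each $k$-set
to be in our ensemble with probability $p = t/\binom{n}{k}$. We thus have

$$\prod_{i \in I} P(\bar{B}_i) = (1 - p^2)^{\frac{1}{2}\binom{n}{k}\binom{k}{r}\binom{n-k}{k-r}},$$

so that the lower Janson inequality yields

$$\exp\Bigl\{-\frac{\binom{n}{k}\binom{k}{r}\binom{n-k}{k-r}p^2}{2(1 - p^2)}\Bigr\} \le (1 - p^2)^{\frac{1}{2}\binom{n}{k}\binom{k}{r}\binom{n-k}{k-r}} \le P\Bigl(\bigcap_{i \in I} \bar{B}_i\Bigr)$$

and thus leads to the conclusion that $P(X_r = 0) \to 1$ when $p = o\bigl(\sqrt{2/\bigl(\binom{n}{k}\binom{k}{r}\binom{n-k}{k-r}\bigr)}\bigr)$
or equivalently, when $t = o\bigl(\sqrt{2\binom{n}{k}/\bigl(\binom{k}{r}\binom{n-k}{k-r}\bigr)}\bigr)$.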

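Note next that $\mu = E(X_r) = \sum_{i \in I} P(B_i) = \frac{1}{2}\binom{n}{k}\binom{k}{r}\binom{n-k}{k-r}p^2$ and that
$\Delta \le \binom{n}{k}\bigl[\binom{k}{r}\binom{n-k}{k-r}\bigr]^2 p^3$, so that the upper bound of Janson's inequality yields

$$P(X_r = 0) \le \exp\Bigl\{-\mu + \frac{\Delta}{1 - p^2}\Bigr\} \le \exp\Bigl\{-\frac{1}{2}\binom{n}{k}\binom{k}{r}\binom{n-k}{k-r}p^2\Bigl(1 - \frac{2\binom{k}{r}\binom{n-k}{k-r}p}{1 - p^2}\Bigr)\Bigr\}$$

and thus leads to the conclusion that

$$\sqrt{\frac{2\binom{n}{k}}{\binom{k}{r}\binom{n-k}{k-r}}} \ll t \ll \frac{\binom{n}{k}}{2\binom{k}{r}\binom{n-k}{k-r}} \ \Rightarrow\ P(X_r = 0) \to 0.$$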

Assuming, as before, that $k^2 \gg n$ and $k \ll n$, in order to prove that the
above range for $t$ is a valid one we must prove that

$$\sqrt{\frac{2\binom{n}{k}}{\binom{k}{r}\binom{n-k}{k-r}}} \ll \frac{\binom{n}{k}}{2\binom{k}{r}\binom{n-k}{k-r}},$$

or equivalently that

$$\frac{\binom{k}{r}\binom{n-k}{k-r}}{\binom{n}{k}} \ll 1.$$

Now the above is simply the "hypergeometric" probability function (making
$k$ without-replacement selections from a drawer with $k$ white and $n - k$ black
socks). The probability of drawing $r$ white socks is maximized around the
mean value of $r = k^2/n$, and we thus need to show that $\binom{k}{r}\binom{n-k}{k-r} \ll \binom{n}{k}$ when
$r = k^2/n$. This is confirmed below using Stirling's formula and an auxiliary
result on Poisson approximation: By Theorem 6.A in [2], the total variation
distance between the distribution of our hypergeometric random variable $W$
and a Poisson distribution with the same mean $k^2/n$ satisfies

$$d_{TV}(\mathcal{L}(W), \mathrm{Po}(k^2/n)) \le \frac{3k}{n},$$

so that

$$\mathrm{Po}(k^2/n, k^2/n) - 3k/n \le P(W = k^2/n) \le \mathrm{Po}(k^2/n, k^2/n) + 3k/n.$$

Stirling's formula yields $\mathrm{Po}(k^2/n, k^2/n) \sim \sqrt{n}/(k\sqrt{2\pi})$, so that for $r = k^2/n$,

$$\frac{\binom{k}{r}\binom{n-k}{k-r}}{\binom{n}{k}} \le (1 + o(1))\frac{\sqrt{n}}{k\sqrt{2\pi}} + \frac{3k}{n},$$

implying that

$$\frac{\binom{k}{r}\binom{n-k}{k-r}}{\binom{n}{k}} \to 0 \quad (n, k \to \infty).$$

We conclude by monotonicity that

$$P(X_r = 0) \to 0 \quad \text{when } t \gg \sqrt{\frac{2\binom{n}{k}}{\binom{k}{r}\binom{n-k}{k-r}}}.$$

Finally, if we let $t = (A + o(1))\sqrt{2\binom{n}{k}/\bigl(\binom{k}{r}\binom{n-k}{k-r}\bigr)}$, the lower and upper bounds
of Janson's inequality together yield $P(X_r = 0) \to e^{-A^2}$ ($n, k \to \infty$).
Derandomization of these preliminary results follows as in the proof of Theorem
1. We only provide details for the last case, viz., when the actual number of
$k$-sets chosen is $t = (A + o(1))\sqrt{2\binom{n}{k}/\bigl(\binom{k}{r}\binom{n-k}{k-r}\bigr)}$. We know that when

$$p = (A + o(1))\sqrt{\frac{2}{\binom{n}{k}\binom{k}{r}\binom{n-k}{k-r}}}, \qquad (2)$$

$P(X_r = 0) \to e^{-A^2}$. We define, again quite arbitrarily,




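where $t = (A + o(1))\sqrt{2\binom{n}{k}/\bigl(\binom{k}{r}\binom{n-k}{k-r}\bigr)}$, and note that both $p^+$ and $p^-$ satisfy (2).
Using first $p^+$ to yield a random collection $R^+$, we see that

$$E(\|R^+\|) = p^+\binom{n}{k} = (A + o(1))\sqrt{\frac{2\binom{n}{k}}{\binom{k}{r}\binom{n-k}{k-r}}}.$$

Also,

$$\mathrm{Var}(\|R^+\|) = E(\|R^+\|)(1 - p^+) \sim \sqrt{2}A\sqrt{\frac{\binom{n}{k}}{\binom{k}{r}\binom{n-k}{k-r}}},$$

and thus

$$P(\|R^+\| < t) \le P\bigl(\bigl|\|R^+\| - E\|R^+\|\bigr| > E\|R^+\| - t\bigr) \le \frac{\mathrm{Var}(\|R^+\|)}{(E\|R^+\| - t)^2} \to 0,$$

where the final step is true by Chebychev's inequality. We thus have, writing
$P_{p^+}$ for probabilities in the model with selection probability $p^+$,

$$P(X_r = 0 \mid \|R\| = t) \ge P_{p^+}(X_r = 0) - P(\|R^+\| < t),$$

so that

$$\liminf P(X_r = 0 \mid \|R\| = t) \ge e^{-A^2}.$$

The proof for $p^-$ follows similarly, yielding

$$\limsup P(X_r = 0 \mid \|R\| = t) \le e^{-A^2},$$

and consequently that

$$P\Bigl(X_r = 0 \,\Big|\, \|R\| = t,\ t = (A + o(1))\sqrt{2\tbinom{n}{k}/\bigl(\tbinom{k}{r}\tbinom{n-k}{k-r}\bigr)}\Bigr) \to e^{-A^2}$$

as required. □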
as required. □ 



Theorem 4 Consider a family of $t$ subsets, each of size $k$, taken from an
$n$-element set. Let $X_r$ be the random variable which represents the number
of pairs of chosen subsets which overlap in exactly $r$ elements. Then the
distribution of $X_r$ can be closely approximated by a Poisson distribution with
$\lambda_r = E(X_r) = \frac{1}{2}\binom{n}{k}\binom{k}{r}\binom{n-k}{k-r}p^2$ when $p = o\bigl(1/\bigl(\binom{k}{r}\binom{n-k}{k-r}\bigr)\bigr)$.
Proof. Again we use the Stein-Chen method. Consider the following
coupling: If $I_j = 1$, i.e., if the $j$th pair of $k$-subsets that overlap in $r$ elements
is selected, we let $J_{ij} = I_i$ $\forall i$. If $I_j = 0$, we add one or both unchosen
$k$-sets to our collection by changing the coin flips as needed. Then, we set
$J_{ij} = 1$ if the addition of these $k$-sets creates the selection of the $i$th pair of
$k$-sets which overlap in $r$ elements to our ensemble. Since $J_{ij} \ge I_i$ $\forall i \ne j$,
we may apply the same Stein-Chen approximation theorem as before. Note
that $\lambda_r = E(X_r) = \frac{1}{2}\binom{n}{k}\binom{k}{r}\binom{n-k}{k-r}p^2$, so that

$$d_{TV}(\mathcal{L}(X_r), \mathrm{Po}(\lambda_r)) \le \frac{1 - e^{-\lambda_r}}{\lambda_r}\Bigl(\mathrm{Var}(X_r) - \lambda_r + 2\sum_j P^2(I_j = 1)\Bigr) \le \frac{\mathrm{Var}(X_r)}{\lambda_r} - 1 + 2p^2. \qquad (3)$$

As before, we see that

$$\mathrm{Var}(X_r) \le \frac{1}{2}\binom{n}{k}\binom{k}{r}\binom{n-k}{k-r}(p^2 - p^4) + \binom{n}{k}\binom{k}{r}^2\binom{n-k}{k-r}^2(p^3 - p^4),$$

so that (3) yields

$$d_{TV}(\mathcal{L}(X_r), \mathrm{Po}(\lambda_r)) \le 2\binom{k}{r}\binom{n-k}{k-r}p + \Bigl(1 - 2\binom{k}{r}\binom{n-k}{k-r}\Bigr)p^2,$$

as needed. □



5 The Joint Distribution of $X_0, X_1, \ldots, X_b$



Theorem 5 The joint distribution of $(X_0, X_1, \ldots, X_b)$ can be approximated
by independent Poisson distributions, provided $b$ is "not too large" and
provided the probability $p$ of choosing any particular $k$-set satisfies

$$p = o\Biggl(\Biggl(\binom{n}{k}\Bigl(\sum_{j=0}^{b}\binom{k}{j}\binom{n-k}{k-j}\Bigr)^2\Biggr)^{-1/3}\Biggr).$$

In other words, the total variation distance $d_{TV}$ between these two
distributions satisfies

$$d_{TV}\Bigl(\mathcal{L}(X_0, X_1, \ldots, X_b), \prod_{j=0}^{b}\mathrm{Po}(\lambda_j)\Bigr) \le \varepsilon_{n,b,k},$$

where $\varepsilon_{n,b,k} \to 0$ as $n, k \to \infty$.

Proof. First we note that the restriction on p satisfies the p-requirements 
for each individual Poisson approximation. This can be seen by noting that 
for each r = 0, 1, . . . , b, 




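Consider the indicator variables $I_{(j,i)}$ where the $(j, i)$th pair is the $i$th pair of
subsets which overlap in exactly $j$ elements. If $I_{(j,i)} = 1$, which is to say that
the $(j, i)$th pair was chosen, then we let $J^{(j,i)}_{(\beta,\alpha)} = I_{(\beta,\alpha)}$. If $I_{(j,i)} = 0$ then we
choose the $(j, i)$th pair of subsets. If $I_{(\beta,\alpha)} = 1$ after these additional choices,
we set $J^{(j,i)}_{(\beta,\alpha)} = 1$.

Since we have found a coupling which satisfies

$$J^{(j,i)}_{(\beta,\alpha)} \ge I_{(\beta,\alpha)} \quad \text{for all } (\beta,\alpha) \ne (j,i),$$

Theorem 10.J of [2] yields, with $N_j = \frac{1}{2}\binom{n}{k}\binom{k}{j}\binom{n-k}{k-j}$,

$$d_{TV}\Bigl(\mathcal{L}(X_0, X_1, \ldots, X_b), \prod_{j=0}^{b}\mathrm{Po}(\lambda_j)\Bigr) \le \varepsilon_{n,b,k},$$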

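where

$$\varepsilon_{n,b,k} = \sum_{j=0}^{b}\sum_{i=1}^{N_j}\Bigl(P^2(I_{(j,i)} = 1) + P(I_{(j,i)} = 1)\sum_{(\beta,\alpha) \ne (j,i)} P\bigl(J^{(j,i)}_{(\beta,\alpha)} \ne I_{(\beta,\alpha)}\bigr)\Bigr).$$

We thus have

$$\varepsilon_{n,b,k} \le \sum_{j=0}^{b}\sum_{i=1}^{N_j}\Bigl(p^4 + p^2\sum_{(\beta,\alpha) \ne (j,i)} P\bigl(I_{(\beta,\alpha)} = 0,\ J^{(j,i)}_{(\beta,\alpha)} = 1\bigr)\Bigr) \le \sum_{j=0}^{b}\sum_{i=1}^{N_j} p^2\Bigl(p^2 + \sum_{\substack{0 \le \beta \le b;\ 1 \le \alpha \le N_\beta \\ (\beta,\alpha) \ne (j,i);\ |(\beta,\alpha) \cap (j,i)| = 1}} 2p\Bigr).$$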
Clearly this quantity tends to zero when

$$p = o\Biggl(\Biggl(\binom{n}{k}\Bigl(\sum_{j=0}^{b}\binom{k}{j}\binom{n-k}{k-j}\Bigr)^2\Biggr)^{-1/3}\Biggr),$$

which was the stated restriction on p. □
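
The sketch below gives a rough numerical feel for how the restriction on $p$ tightens as $b$ grows. It assumes the restriction reads $p = o\bigl(\bigl(\binom{n}{k}S_b^2\bigr)^{-1/3}\bigr)$ with $S_b = \sum_{j=0}^{b}\binom{k}{j}\binom{n-k}{k-j}$ (a reading of the formula, which is partly illegible in this copy), and compares that level, on a log scale, with the EKR threshold probability $\sqrt{2/\bigl(\binom{n}{k}\binom{n-k}{k}\bigr)}$. All function names and parameter values are hypothetical, and since the conditions are asymptotic, a positive gap at finite $n$ is only suggestive.

```python
from math import exp, lgamma, log


def log_binom(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))


def log_p_multivariate(n: int, k: int, b: int) -> float:
    """log of (C(n,k) * S_b^2)^(-1/3), with S_b = sum_{j<=b} C(k,j) * C(n-k,k-j)."""
    log_s = log_sum_exp(
        [log_binom(k, j) + log_binom(n - k, k - j) for j in range(b + 1)]
    )
    return -(log_binom(n, k) + 2 * log_s) / 3


def log_p_ekr(n: int, k: int) -> float:
    """log of the EKR threshold probability sqrt(2 / (C(n,k) * C(n-k,k)))."""
    return 0.5 * (log(2) - log_binom(n, k) - log_binom(n - k, k))


def log_gap(n: int, k: int, b: int) -> float:
    """Positive values mean the multivariate level sits above the EKR threshold."""
    return log_p_multivariate(n, k, b) - log_p_ekr(n, k)


n, k = 10**6, 5_000  # illustrative values only
for b in range(6):
    print(f"b={b}: log gap = {log_gap(n, k, b):.3f}")
```

The gap decreases with $b$ because $S_b$ grows, and for $b = 0$ it behaves like $k^2/(6n) - \tfrac{1}{2}\log 2$, so larger $k$ leaves room for larger $b$.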
Discussion. It has been shown in Theorem 5 that a multivariate Poisson
approximation is valid for $\mathcal{L}(X_0, \ldots, X_b)$ when $p \ll \Bigl(\binom{n}{k}\bigl(\sum_{j=0}^{b}\binom{k}{j}\binom{n-k}{k-j}\bigr)^2\Bigr)^{-1/3}$.
This is a stronger condition, naturally, than those obtained in Theorem 4 for
the univariate Poisson approximations for $\mathcal{L}(X_r)$. We need to verify,
however, that the threshold for multivariate Poisson approximation occurs at a
level that is larger than the threshold for the EKR property; in other words
we must have

$$\Biggl(\frac{2}{\binom{n}{k}\binom{n-k}{k}}\Biggr)^{1/2} \ll \Biggl(\binom{n}{k}\Bigl(\sum_{j=0}^{b}\binom{k}{j}\binom{n-k}{k-j}\Bigr)^2\Biggr)^{-1/3}. \qquad (4)$$

The reason for imposing this requirement (we are not required to do so)
is that one wishes to compute multivariate probability approximations for
quantities involving all the $X_j$'s, $0 \le j \le b$, and, moreover, one would like to
be able to meaningfully incorporate threshold situations into the multivariate
approximation. In practice, given the value of $k$, (4) defines a condition
that tells us how large $b$ may be before a Poisson approximation becomes
unrealistic. In general, the larger $k$ is, the larger $b$ is allowed to be. This may
be best seen by reexpressing (4) as